Introduction

This project attempts to fulfill the assignment outlined in https://www.gastrograph.com/blogs/interviewing-data-science-interns.

“Using only partial least squares regression for generalized linear models (the plsRglm package in R), build an ensemble model to predict the quality score given to each wine from the Vinho Verde region of Portugal (see the data bullet in the Requirements section below to download the datasets). The data consists of chemical tests on wines and an overall assessment score averaged from at least three professional wine tasters. This is interesting data for AFS as the lack of consistent preferences among professional tasters is one of the reasons our company exists.”

“The rubric for assessment is explained in the Selection Criteria section below. The prediction model should be trained and will be tested on both red and white wine after joining the two datasets. We will split the supplied data into an 80% training set and a 20% hold-out validation set before running your training script with a random seed. We will use mean absolute error as a performance metric.”
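For reference, the mean absolute error used for scoring is just the average absolute difference between predictions and observations. A minimal sketch (the notebook later uses mae() from the Metrics package for this; mae_by_hand is an illustrative name, not part of any package):

```r
# Mean absolute error: the mean of |prediction - observation|.
mae_by_hand <- function(pred, obs) mean(abs(pred - obs))

mae_by_hand(c(5, 6, 7), c(6, 6, 6))  # (1 + 0 + 1) / 3
```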

Load necessary libraries

Several packages needed to be installed in order to use plsRglm. Other packages are loaded for data exploration, tidying, and analysis. The install commands are commented out rather than removed in case they need to be re-installed on another machine.

#install.packages("mvtnorm")
#install.packages("bipartite")

# Package 'nloptr' needs the system NLopt library. One way to get it:
# Download https://github.com/stevengj/nlopt/archive/v2.7.1.tar.gz
# Extract the contents, enter the directory, and run:
# cmake . && make && sudo make install
# On Debian/Ubuntu the packaged development files also work:
# sudo apt install libnlopt-dev
# Then install the package itself:
#install.packages("nloptr")

# Package 'nloptr' is a prerequisite for 'car'
#install.packages("car")

# Packages 'mvtnorm', 'bipartite', and 'car' are prerequisites for 'plsRglm'.
#install.packages("plsRglm")
library("plsRglm") # See https://arxiv.org/abs/1810.01005 

#install.packages("tidyverse")
library("tidyverse") # for tidying up the data set
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.6     ✓ dplyr   1.0.8
✓ tidyr   1.2.0     ✓ stringr 1.4.0
✓ readr   2.1.2     ✓ forcats 0.5.1
── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
#install.packages("ggplot2")
library("ggplot2") # for visualizations
#install.packages("GGally")
library("GGally")
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
#install.packages('plsdof')
library('plsdof') # Necessary for "plsRglm" to work
Loading required package: MASS

Attaching package: ‘MASS’

The following object is masked from ‘package:dplyr’:

    select
#install.packages("moments")
library('moments') # For calculating skewness of distributions

#install.packages("Metrics")
library("Metrics") # For calculating MAE

library("stringr") # For regex

Read in the data

The data are stored in two files. One contains information on red wine and the other on white wine. In both cases, values are separated by semicolons. We will read in both files, add a variable describing the color of the wine, and then merge them into one dataframe.

# read csv's
df.red <- read.csv("data/winequality-red.csv", sep=';')
df.white <- read.csv("data/winequality-white.csv", sep=';')

# add categorical color variables to both sets
df.red['color'] <- 'red'
df.white['color'] <- 'white'

# merge datasets
df <- rbind(df.white, df.red)

The goal is to use these input variables:

1 - fixed.acidity

2 - volatile.acidity

3 - citric.acid

4 - residual.sugar

5 - chlorides

6 - free.sulfur.dioxide

7 - total.sulfur.dioxide

8 - density

9 - pH

10 - sulphates

11 - alcohol

13 - color

To predict this output variable:

12 - quality (score between 0 and 10)

Explore the data

Let’s print out a few things to get an idea of what the data look like:

str(df)
'data.frame':   6497 obs. of  13 variables:
 $ fixed.acidity       : num  7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
 $ volatile.acidity    : num  0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
 $ citric.acid         : num  0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
 $ residual.sugar      : num  20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
 $ chlorides           : num  0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
 $ free.sulfur.dioxide : num  45 14 30 47 47 30 30 45 14 28 ...
 $ total.sulfur.dioxide: num  170 132 97 186 186 97 136 170 132 129 ...
 $ density             : num  1.001 0.994 0.995 0.996 0.996 ...
 $ pH                  : num  3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
 $ sulphates           : num  0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
 $ alcohol             : num  8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
 $ quality             : int  6 6 6 6 6 6 6 6 6 6 ...
 $ color               : chr  "white" "white" "white" "white" ...
head(df)
tail(df)
summary(df)
 fixed.acidity    volatile.acidity  citric.acid     residual.sugar     chlorides      
 Min.   : 3.800   Min.   :0.0800   Min.   :0.0000   Min.   : 0.600   Min.   :0.00900  
 1st Qu.: 6.400   1st Qu.:0.2300   1st Qu.:0.2500   1st Qu.: 1.800   1st Qu.:0.03800  
 Median : 7.000   Median :0.2900   Median :0.3100   Median : 3.000   Median :0.04700  
 Mean   : 7.215   Mean   :0.3397   Mean   :0.3186   Mean   : 5.443   Mean   :0.05603  
 3rd Qu.: 7.700   3rd Qu.:0.4000   3rd Qu.:0.3900   3rd Qu.: 8.100   3rd Qu.:0.06500  
 Max.   :15.900   Max.   :1.5800   Max.   :1.6600   Max.   :65.800   Max.   :0.61100  
 free.sulfur.dioxide total.sulfur.dioxide    density             pH          sulphates     
 Min.   :  1.00      Min.   :  6.0        Min.   :0.9871   Min.   :2.720   Min.   :0.2200  
 1st Qu.: 17.00      1st Qu.: 77.0        1st Qu.:0.9923   1st Qu.:3.110   1st Qu.:0.4300  
 Median : 29.00      Median :118.0        Median :0.9949   Median :3.210   Median :0.5100  
 Mean   : 30.53      Mean   :115.7        Mean   :0.9947   Mean   :3.219   Mean   :0.5313  
 3rd Qu.: 41.00      3rd Qu.:156.0        3rd Qu.:0.9970   3rd Qu.:3.320   3rd Qu.:0.6000  
 Max.   :289.00      Max.   :440.0        Max.   :1.0390   Max.   :4.010   Max.   :2.0000  
    alcohol         quality         color          
 Min.   : 8.00   Min.   :3.000   Length:6497       
 1st Qu.: 9.50   1st Qu.:5.000   Class :character  
 Median :10.30   Median :6.000   Mode  :character  
 Mean   :10.49   Mean   :5.818                     
 3rd Qu.:11.30   3rd Qu.:6.000                     
 Max.   :14.90   Max.   :9.000                     

Let’s now start exploring the data one variable at a time. I’ve created a function below that will make a plot of two overlapping transparent histograms. It will be useful to compare the distributions for red wine and white wine.

plot_multi_histogram <- function(df, feature, label.column, alpha, binwidth) {
  # Two overlapping transparent histograms, one per level of label.column
  plt <- ggplot(df, aes(x = .data[[feature]], fill = .data[[label.column]])) +
    geom_histogram(alpha = alpha, position = "identity", color = "black", binwidth = binwidth) +
    labs(x = feature, y = "Count")
  plt + guides(fill = guide_legend(title = label.column))
}

fixed.acidity

#   1 - fixed acidity
summary(df$fixed.acidity)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.800   6.400   7.000   7.215   7.700  15.900 
#table(df$fixed.acidity)
#sort(unique(df$fixed.acidity))
#ggplot(data = df, aes(x=fixed.acidity, fill=color)) + geom_histogram(alpha=0.2, position="identity")
plot_multi_histogram(df, "fixed.acidity", "color", alpha=0.4, binwidth=1)

volatile.acidity

#   2 - volatile acidity
summary(df$volatile.acidity)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0800  0.2300  0.2900  0.3397  0.4000  1.5800 
plot_multi_histogram(df, "volatile.acidity", "color", alpha=0.4, binwidth=0.1)

citric.acid

#   3 - citric acid
summary(df$citric.acid)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.2500  0.3100  0.3186  0.3900  1.6600 
plot_multi_histogram(df, "citric.acid", "color", alpha=0.4, binwidth=0.1)

residual.sugar

#   4 - residual sugar
summary(df$residual.sugar)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.600   1.800   3.000   5.443   8.100  65.800 
plot_multi_histogram(df, "residual.sugar", "color", alpha=0.4, binwidth=2)

chlorides

#   5 - chlorides
summary(df$chlorides)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00900 0.03800 0.04700 0.05603 0.06500 0.61100 
plot_multi_histogram(df, "chlorides", "color", alpha=0.4, binwidth=0.01)

free.sulfur.dioxide

#   6 - free sulfur dioxide
summary(df$free.sulfur.dioxide)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00   17.00   29.00   30.53   41.00  289.00 
plot_multi_histogram(df, "free.sulfur.dioxide", "color", alpha=0.4, binwidth=10)

total.sulfur.dioxide

#   7 - total sulfur dioxide
summary(df$total.sulfur.dioxide)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    6.0    77.0   118.0   115.7   156.0   440.0 
plot_multi_histogram(df, "total.sulfur.dioxide", "color", alpha=0.4, binwidth=10)

density

#   8 - density
summary(df$density)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9871  0.9923  0.9949  0.9947  0.9970  1.0390 
plot_multi_histogram(df, "density", "color", alpha=0.4, binwidth=0.001)

pH

#   9 - pH
summary(df$pH)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.720   3.110   3.210   3.219   3.320   4.010 
plot_multi_histogram(df, "pH", "color", alpha=0.4, binwidth=0.05)

sulphates

#   10 - sulphates
summary(df$sulphates)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2200  0.4300  0.5100  0.5313  0.6000  2.0000 
plot_multi_histogram(df, "sulphates", "color", alpha=0.4, binwidth=0.1)

alcohol

#   11 - alcohol
summary(df$alcohol)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8.00    9.50   10.30   10.49   11.30   14.90 
plot_multi_histogram(df, "alcohol", "color", alpha=0.4, binwidth=0.2)

quality

#   12 - quality (score between 0 and 10)
summary(df$quality)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   5.000   6.000   5.818   6.000   9.000 
plot_multi_histogram(df, "quality", "color", alpha=0.4, binwidth=1)

By looking at the data, we have learned the following:

There is more data on white wine than red wine, so we may need to reweight the data so that the model learns to predict the quality of red wine as well as that of white wine.

Several of the distributions are skewed, indicating the presence of long tails or outliers.

Now let’s look for correlations between features in the data with a ggpairs plot.

ggpairs(df, columns=1:12, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))

The highest correlations we see are the following:

0.721 between free.sulfur.dioxide and total.sulfur.dioxide

0.687 between density and alcohol

0.553 between density and residual.sugar

0.495 between total.sulfur.dioxide and residual.sugar

0.459 between density and fixed.acidity

0.444 between alcohol and quality

0.414 between total.sulfur.dioxide and volatile.acidity

0.403 between free.sulfur.dioxide and residual.sugar

There is also a correlation between pH and volatile.acidity, but it’s different between red wine and white wine.

Clearly the alcohol content will be useful for our model because of its relatively high correlation with the wine quality rating (0.444).

We also see some outliers. Let’s take a closer look at some of the plots to figure out where these outliers are and if they ought to be removed. The most obvious outlier is the one in residual.sugar and density.

ggplot(df, aes(x=residual.sugar, y=density, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")

The point with the highest residual.sugar has over twice as much sugar as the next largest point, and it lies twice as far from the minimum density as the next most dense point. It is clearly an outlier and I believe it is safe to remove it from the data. We'll do this by cutting out all points with residual.sugar > 40. The next largest point is also separated from most of the other points, but more so in density than in residual.sugar. Density is correlated with both alcohol (0.687) and residual.sugar (0.553), so we can assume that much of the information about density is already captured by the alcohol and sugar variables and remove density from the dataset.

Another outlier is in free.sulfur.dioxide and total.sulfur.dioxide.

ggplot(df, aes(x=free.sulfur.dioxide, y=total.sulfur.dioxide, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")

This point has almost twice as much free sulfur dioxide as the next largest point. However, it is not much of an outlier on the other axis, total.sulfur.dioxide. It also has the minimum quality rating, which means it could have useful predictive power. There is a strong correlation between these two variables (0.721), which means it may be better to simply remove the variable free.sulfur.dioxide from the dataset rather than remove this data point.

Let’s apply the changes: remove the sugar outlier and drop the columns free.sulfur.dioxide (6) and density (8).

df.cut <- df[df$residual.sugar < 40, c(1:5, 7, 9:13)]

Now we’ll redraw the plots.

ggpairs(df.cut, columns=1:10, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))

There are still some points that tend to isolate themselves from most of the data, but not to such a degree as the outlier that we removed. They seem to simply follow the shape of a long-tailed distribution. This calls for a transformation of the data to make the distributions more normal.

Let’s begin the transformations by first calculating the skewness of the variables.

for (name in names(df.cut[1:9])) {
  skew <- skewness(df.cut[[name]], na.rm = TRUE)
  print(paste(name, skew))
}
[1] "fixed.acidity 1.72303425219973"
[1] "volatile.acidity 1.49292242490666"
[1] "citric.acid 0.47177011412896"
[1] "residual.sugar 1.1695985102309"
[1] "chlorides 5.39870943969405"
[1] "total.sulfur.dioxide -0.000889571569577466"
[1] "pH 0.387127126904609"
[1] "sulphates 1.79749476995869"
[1] "alcohol 0.565985435318878"

We now create a copy of the dataframe and apply transformations. The functions 1/x, sqrt, and log10 are useful for reshaping positively skewed distributions into a more Gaussian form, depending on the severity of the skew. The following transformations give reasonably symmetric distributions.

df.transformed <- data.frame(df.cut)
df.transformed$fixed.acidity <- 1/(df.transformed$fixed.acidity)
df.transformed$sulphates <- 1/(df.transformed$sulphates)
df.transformed$volatile.acidity <- sqrt(1/(df.transformed$volatile.acidity))
df.transformed$residual.sugar <- log10(df.transformed$residual.sugar)
df.transformed$chlorides <- sqrt(1/(df.transformed$chlorides))
df.transformed$citric.acid <- sqrt(df.transformed$citric.acid + 0.1)
df.transformed$total.sulfur.dioxide <- sqrt(df.transformed$total.sulfur.dioxide)

Let’s plot a few histograms to see how normal the new distributions are.

plot_multi_histogram(df.transformed, "volatile.acidity", "color", alpha=0.4, binwidth=0.1)

plot_multi_histogram(df.transformed, "residual.sugar", "color", alpha=0.4, binwidth=0.1)

plot_multi_histogram(df.transformed, "chlorides", "color", alpha=0.4, binwidth=2)

plot_multi_histogram(df.transformed, "citric.acid", "color", alpha=0.4, binwidth=0.05)

We can now calculate the skewness again to see if it has improved.

for (name in names(df.transformed[1:9])) {
  skew <- skewness(df.transformed[[name]], na.rm = TRUE)
  print(paste(name, skew))
}
[1] "fixed.acidity -0.158566250269554"
[1] "volatile.acidity 0.248389283010552"
[1] "citric.acid -0.444243638771569"
[1] "residual.sugar 0.237148189575897"
[1] "chlorides 0.115471194185833"
[1] "total.sulfur.dioxide -0.652244809027631"
[1] "pH 0.387127126904609"
[1] "sulphates 0.441836626677462"
[1] "alcohol 0.565985435318878"

We can also plot a few scatter plots to look at two variables at once, to see how the distributions changed after the transformations and look for any more correlations. We can color-code the quality score.

ggplot(df.cut, aes(x=fixed.acidity, y=sulphates, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")

ggplot(df.transformed, aes(x=fixed.acidity, y=sulphates, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")

In this particular case it looks like the data have become more normally distributed. Let’s look at the full correlation plot again because it gives a good view of the entire dataset all at once.

ggpairs(df.transformed, columns=1:10, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))

The data look very good now. However, there does not seem to be any easy way to predict the quality of the wine. None of the variables have a very high correlation with quality, and the colored scatter plots that we have looked at show no obvious clustering of quality scores. Any significant correlations with quality must reside in combinations of more than two features, in a very nonlinear way. This makes the required application of a linear model unusual, but let’s see what we can do.

Let’s center and scale the data, and then prepare a model.

# center and scale the data
df.transformed.scaled <- data.frame(df.transformed)
for (feature in names(df.transformed.scaled[1:9])) {
  df.transformed.scaled[feature] <- scale(df.transformed.scaled[,feature], center = TRUE, scale = TRUE)
}
head(df.transformed.scaled)

First model attempt

The data appear to be ready for the application of a simple test model to see if we can get the plsRglm package working correctly. Let’s transform the color variable into numeric factors and set up our X variables and Y target.

dataY <- factor(df.transformed.scaled$quality, ordered=TRUE)
dataX <- df.transformed.scaled[1:9]
df.transformed.scaled$color <- factor(df.transformed.scaled$color)
dataX['color'] <- as.numeric(df.transformed.scaled$color)

As stated previously, the data ought to be weighted by color so that the model isn’t biased toward white wine; we want it to learn to predict the quality of either color accurately. Here the difference in size between the two groups is not extreme, but it is still good practice to account for it.

weight.factor <- nrow(df.red)/nrow(df.white)
get_weight <- function(color.name) {
  if (color.name == "white") {
    return(weight.factor)
  }
  return(1.0)
}
prior.weights <- sapply(df.transformed.scaled$color, get_weight)

We now build a model using the required package plsRglm. See https://cran.r-project.org/web/packages/plsRglm/vignettes/plsRglm.pdf for more information on this package and specific applications. More information about the arguments can be found at https://www.rdocumentation.org/packages/plsRglm/versions/1.3.0/topics/plsRglm. We will train on 80% of the data and save 20% for testing later.

N.rows <- nrow(dataX)
train.fraction <- 0.8
N.train.rows <- floor(N.rows*train.fraction)
N.test.rows <- N.rows - N.train.rows
set.seed(123) # seed before sampling so the train/test split is reproducible
test.indices <- sample.int(N.rows, N.test.rows) # select N.test.rows randomly
dataX.test <- dataX[test.indices,]
dataX.train <- dataX[-test.indices,]
dataY.test <- dataY[test.indices]
dataY.train <- dataY[-test.indices]
prior.weights.test <- prior.weights[test.indices]
prior.weights.train <- prior.weights[-test.indices]
model.pls <- plsRglm(dataY.train,dataX.train,nt=10,limQ2set=.0975,
             dataPredictY=dataX.train,modele="pls-glm-polr",family=NULL,typeVC="none",
             EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
             alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train,
             sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
____************************************************____
Only naive DoF can be used with PLS GLM
Only naive DoF can be used with weighted PLS

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
____Component____ 3 ____
____Component____ 4 ____
____Component____ 5 ____
No more significant predictors (<0.05) found
Warning only 5 components were thus extracted
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
model.pls
Number of required components:
[1] 10
Number of successfully computed components:
[1] 5
Coefficients:
                            [,1]
3|4                  -7.24351760
4|5                  -5.12647532
5|6                  -2.00557605
6|7                   0.59647393
7|8                   2.89485747
8|9                   6.62775696
fixed.acidity        -0.05776754
volatile.acidity     -0.69285102
citric.acid          -0.05134886
residual.sugar       -0.34203406
chlorides            -0.19977215
total.sulfur.dioxide  0.07263541
pH                   -0.01807064
sulphates             0.21915055
alcohol              -1.03638653
color                 0.71768891
Information criteria and Fit statistics:
               AIC      BIC Missclassed Chi2_Pearson_Y
Nb_Comp_0 13215.08 13254.41        2919       6661.055
Nb_Comp_1 11912.82 11958.71        2583       5122.532
Nb_Comp_2 11610.03 11662.48        2448       4836.552
Nb_Comp_3 11469.37 11528.37        2424       4685.632
Nb_Comp_4 11419.96 11485.51        2390       4649.034
Nb_Comp_5 11415.20 11487.31        2373       4632.419

Let’s take a look at how well the model performs in predicting the quality of wines. We compare the predictions and observations with a table, a plot, and a calculation of the Mean Absolute Error (MAE).

predictions.train <- as.numeric(predict(model.pls$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.pls, newdata = dataX.test, type="class"))
observations.test <- as.numeric(dataY.test)
table(as.numeric(unlist(dataY.train)), predictions.train )
   predictions.train
       3    4    5    6
  1   14   12    0    0
  2  101   69    4    0
  3 1040  671   14    1
  4  504 1638  135    0
  5   35  657  145    1
  6    8  106   37    0
  7    0    3    1    0
table(as.numeric(unlist(dataY.test)), predictions.test )
   predictions.test
      3   4   5   6
  1   4   0   0   0
  2  23  19   0   0
  3 251 158   3   0
  4 115 418  25   0
  5  11 181  48   1
  6   2  28  12   0
  7   0   1   0   0
plot(observations.test, predictions.test)

mae(observations.train, predictions.train)
[1] 0.5134719
mae(observations.test, predictions.test)
[1] 0.5030769

This is not bad. It’s much better than randomly guessing scores over the observed range, in which case I would expect an MAE of about 1.8. But still, maybe we can do better.
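The random-guessing baseline can be checked with a quick expectation calculation. This is a generic, purely illustrative sketch (expected_random_mae is a hypothetical helper, not part of the pipeline); guessing uniformly over the observed 3–9 score range against a distribution concentrated at 5 and 6, as in this dataset, lands near 1.8:

```r
# Expected MAE of guesses drawn uniformly at random from a candidate set,
# against a vector of observed quality scores: average |guess - observation|
# over every (observation, guess) pair.
expected_random_mae <- function(quality, guesses) {
  mean(outer(quality, guesses, function(q, g) abs(q - g)))
}

expected_random_mae(c(5, 5, 6, 6), 3:9)  # 50/28, about 1.79
```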

A cross-validation experiment demonstrates that 10 is an appropriate number of components for this model, given the training data. However, it takes a very long time to run, so it is left commented out below.

#cv.modpls <- cv.plsRglm(dataY.train,dataX.train,nt=10,modele="pls-glm-polr",NK=5)
#cv.modpls
#res.cv.modpls=cvtable(summary(cv.modpls, MClassed = TRUE))
#plot(res.cv.modpls) # This suggests that 10 is the preferred number of components for the model.

Ensemble

Let’s try to improve the model by implementing an ensemble. Linear regression models are deterministic: refitting the same model on the same data and averaging the results gains nothing, unlike a random forest, which can grow thousands of different trees simply by changing the random seed. So instead of bagging models together, let’s try a boosting technique. The first model uses the features to predict the quality score. The second model uses the features plus the first model’s score to predict the true quality score, thus fitting the difference, and so on.

dataX.train.2 <- data.frame(dataX.train)
dataX.train.2["prediction1"] <- as.numeric( predict(model.pls$FinalModel) )
dataX.test.2 <- data.frame(dataX.test)
dataX.test.2["prediction1"] <- as.numeric( predict(model.pls, newdata = dataX.test, type="class") )
model.pls.2 <- plsRglm(dataY.train,dataX.train.2,nt=10,limQ2set=.0975,
                     dataPredictY=dataX.train.2,modele="pls-glm-polr",family=NULL,typeVC="none",
                     EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                     alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train)
____************************************************____
Only naive DoF can be used with weighted PLS

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
____Component____ 3 ____
____Component____ 4 ____
____Component____ 5 ____
____Component____ 6 ____
____Component____ 7 ____
____Component____ 8 ____
____Component____ 9 ____
____Component____ 10 ____
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
model.pls.2
Number of required components:
[1] 10
Number of successfully computed components:
[1] 10
Coefficients:
                            [,1]
3|4                  -7.24189698
4|5                  -5.12747456
5|6                  -2.00859814
6|7                   0.59434766
7|8                   2.89427929
8|9                   6.62714511
fixed.acidity        -0.06136429
volatile.acidity     -0.68364370
citric.acid          -0.04005226
residual.sugar       -0.33131024
chlorides            -0.18772719
total.sulfur.dioxide  0.04089836
pH                    0.01893711
sulphates             0.21759147
alcohol              -1.04804010
color                 0.76629508
prediction1          -0.02190571
Information criteria and Fit statistics:
                AIC      BIC Missclassed Chi2_Pearson_Y
Nb_Comp_0  13215.08 13254.41        2919       6661.055
Nb_Comp_1  11711.66 11757.55        2379       4660.746
Nb_Comp_2  11581.03 11633.48        2373       4364.090
Nb_Comp_3  11477.39 11536.39        2378       4385.356
Nb_Comp_4  11416.54 11482.10        2392       4652.114
Nb_Comp_5  11414.01 11486.12        2375       4615.955
Nb_Comp_6  11415.47 11494.14        2375       4625.482
Nb_Comp_7  11417.35 11502.58        2374       4618.055
Nb_Comp_8  11419.34 11511.12        2375       4620.307
Nb_Comp_9  11421.34 11519.67        2375       4620.565
Nb_Comp_10 11423.34 11528.23        2374       4620.568

We now calculate the MAE to see how well the second model performed.

predictions.train <- as.numeric(predict(model.pls.2$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.pls.2, newdata = dataX.test.2, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
[1] 0.5138568
mae(observations.test, predictions.test)
[1] 0.5015385

Boosting in this manner offered no significant improvement in the MAE.

We could try a boosted ensemble that fits one variable at a time, with each successive model fitting the remaining prediction error, but I think that would amount to multivariate regression, only worse.

Let’s continue with a boosted ensemble that uses all of the variables, including the previous model’s result, to make the next prediction. But instead of fitting the quality values at each step, we can fit the difference between the quality values and the previous prediction. Let’s try it.

number.of.models <- 3
models.pls <- vector(mode = "list", length = number.of.models) # empty list of models
errors <- vector(mode = "list", length = number.of.models) # empty list of errors
predictions <- as.numeric(dataY.train)*0.0
targets <- as.numeric(dataY.train)
for (i in 1:number.of.models) {
  dataX.this <- data.frame(dataX.train)
  if (i > 1) {
    dataX.this["prediction1"] <- predictions
  }
  targets <- targets - predictions
  model.this <- plsRglm(targets,dataX.this,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.this,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train,
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
  #model_this
  predictions <- as.numeric(predict(model.this$FinalModel))
  observations <- as.numeric(targets)
  errors[[i]] <- mae(observations, predictions) # [[ ]] assigns a single list element
  models.pls[[i]] <- model.this # single-bracket [ ] here caused the replacement-length warnings below
  print(errors[[i]])
}
____************************************************____
Only naive DoF can be used with PLS GLM
Only naive DoF can be used with weighted PLS

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
____Component____ 3 ____
____Component____ 4 ____
____Component____ 5 ____
No more significant predictors (<0.05) found
Warning only 5 components were thus extracted
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
number of items to replace is not a multiple of replacement length
[[1]]
[1] 0.5134719

____************************************************____
Only naive DoF can be used with PLS GLM
Only naive DoF can be used with weighted PLS

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
____Component____ 3 ____
____Component____ 4 ____
____Component____ 5 ____
____Component____ 6 ____
No more significant predictors (<0.05) found
Warning only 6 components were thus extracted
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
number of items to replace is not a multiple of replacement length
[[1]]
[1] 3.951501

____************************************************____
Only naive DoF can be used with PLS GLM
Only naive DoF can be used with weighted PLS

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
No more significant predictors (<0.05) found
Warning only 2 components were thus extracted
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
number of items to replace is not a multiple of replacement length
[[1]]
[1] 7.93418

This results in models that get progressively worse. I see now that I was on the wrong track: the data are too noisy, so boosting a noisy model just produces an even noisier one. I could bag the models instead, which I avoided at first because every model would be identical; that is only true, though, if each model sees the same data. I could split the dataset into N chunks, fit each chunk separately, and then combine all of the results for the final prediction. But that’s basically just multivariate linear regression on the full dataset, only worse.
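The chunk-and-average idea can at least be sketched. This is only an illustration of the bagging scheme just described, with lm() standing in for the plsRglm learner the assignment requires; bag_predict and n.chunks are hypothetical names, not part of this notebook’s pipeline:

```r
# Bagging sketch: split the training rows into chunks, fit one model per
# chunk, and average the per-chunk predictions. lm() is a stand-in learner.
bag_predict <- function(train.x, train.y, test.x, n.chunks = 5) {
  # Randomly assign each training row to one of n.chunks groups
  chunk.id <- sample(rep(1:n.chunks, length.out = nrow(train.x)))
  per.chunk <- sapply(1:n.chunks, function(k) {
    rows <- chunk.id == k
    fit <- lm(y ~ ., data = cbind(train.x[rows, , drop = FALSE], y = train.y[rows]))
    predict(fit, newdata = test.x)
  })
  rowMeans(per.chunk)  # the ensemble prediction
}
```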

Feature engineering cross-terms for effective polynomial fitting

Let’s try another approach. Since the plsRglm model is designed to handle high-dimensional data with few rows, I’m going to increase the dimensionality of the dataset. This will allow previously nonlinear, quadratic relationships to be described linearly by the plsRglm model.

If the relationship between the features and quality is nonlinear, then I need to create nonlinear features. If I want xy in the model instead of just ax + by, I can make a column xy by multiplying two columns together. I’ll multiply every continuous feature by every other continuous feature (including itself) to get new quadratic columns; note that the double loop below also creates each symmetric cross-term twice (e.g. fi.vo and vo.fi).

dataX.nonlinear <- data.frame(dataX)
# Multiply every pair of the nine continuous features to create quadratic
# cross-terms. The double loop creates both fi.vo and vo.fi, which are
# identical columns.
for (feature1 in names(dataX.nonlinear[1:9])) {
  for (feature2 in names(dataX.nonlinear[1:9])) {
    new.feature <- paste(substring(feature1,1,2), substring(feature2,1,2), sep=".")
    dataX.nonlinear[new.feature] <- dataX.nonlinear[feature1]*dataX.nonlinear[feature2]
  }
}
names(dataX.nonlinear)
 [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
 [4] "residual.sugar"       "chlorides"            "total.sulfur.dioxide"
 [7] "pH"                   "sulphates"            "alcohol"             
[10] "color"                "fi.fi"                "fi.vo"               
[13] "fi.ci"                "fi.re"                "fi.ch"               
[16] "fi.to"                "fi.pH"                "fi.su"               
[19] "fi.al"                "vo.fi"                "vo.vo"               
[22] "vo.ci"                "vo.re"                "vo.ch"               
[25] "vo.to"                "vo.pH"                "vo.su"               
[28] "vo.al"                "ci.fi"                "ci.vo"               
[31] "ci.ci"                "ci.re"                "ci.ch"               
[34] "ci.to"                "ci.pH"                "ci.su"               
[37] "ci.al"                "re.fi"                "re.vo"               
[40] "re.ci"                "re.re"                "re.ch"               
[43] "re.to"                "re.pH"                "re.su"               
[46] "re.al"                "ch.fi"                "ch.vo"               
[49] "ch.ci"                "ch.re"                "ch.ch"               
[52] "ch.to"                "ch.pH"                "ch.su"               
[55] "ch.al"                "to.fi"                "to.vo"               
[58] "to.ci"                "to.re"                "to.ch"               
[61] "to.to"                "to.pH"                "to.su"               
[64] "to.al"                "pH.fi"                "pH.vo"               
[67] "pH.ci"                "pH.re"                "pH.ch"               
[70] "pH.to"                "pH.pH"                "pH.su"               
[73] "pH.al"                "su.fi"                "su.vo"               
[76] "su.ci"                "su.re"                "su.ch"               
[79] "su.to"                "su.pH"                "su.su"               
[82] "su.al"                "al.fi"                "al.vo"               
[85] "al.ci"                "al.re"                "al.ch"               
[88] "al.to"                "al.pH"                "al.su"               
[91] "al.al"               

I should also separate the red and white wine features, because the two wine types appear to behave quite differently.

# For each feature, create a red-only and a white-only copy: the value is
# kept when the wine matches the color and zeroed out otherwise.
for (feature in names(dataX.nonlinear[c(1:9, 11:91)])) {
  r.feature <- paste("r", feature, sep=".")
  w.feature <- paste("w", feature, sep=".")
  dataX.nonlinear[r.feature] <- ifelse(dataX.nonlinear$color == 1,
                                       dataX.nonlinear[[feature]], 0.0)
  dataX.nonlinear[w.feature] <- ifelse(dataX.nonlinear$color == 2,
                                       dataX.nonlinear[[feature]], 0.0)
}

Let’s look at the new dataframe features.

names(dataX.nonlinear)
  [1] "fixed.acidity"          "volatile.acidity"       "citric.acid"           
  [4] "residual.sugar"         "chlorides"              "total.sulfur.dioxide"  
  [7] "pH"                     "sulphates"              "alcohol"               
 [10] "color"                  "fi.fi"                  "fi.vo"                 
 [13] "fi.ci"                  "fi.re"                  "fi.ch"                 
 [16] "fi.to"                  "fi.pH"                  "fi.su"                 
 [19] "fi.al"                  "vo.fi"                  "vo.vo"                 
 [22] "vo.ci"                  "vo.re"                  "vo.ch"                 
 [25] "vo.to"                  "vo.pH"                  "vo.su"                 
 [28] "vo.al"                  "ci.fi"                  "ci.vo"                 
 [31] "ci.ci"                  "ci.re"                  "ci.ch"                 
 [34] "ci.to"                  "ci.pH"                  "ci.su"                 
 [37] "ci.al"                  "re.fi"                  "re.vo"                 
 [40] "re.ci"                  "re.re"                  "re.ch"                 
 [43] "re.to"                  "re.pH"                  "re.su"                 
 [46] "re.al"                  "ch.fi"                  "ch.vo"                 
 [49] "ch.ci"                  "ch.re"                  "ch.ch"                 
 [52] "ch.to"                  "ch.pH"                  "ch.su"                 
 [55] "ch.al"                  "to.fi"                  "to.vo"                 
 [58] "to.ci"                  "to.re"                  "to.ch"                 
 [61] "to.to"                  "to.pH"                  "to.su"                 
 [64] "to.al"                  "pH.fi"                  "pH.vo"                 
 [67] "pH.ci"                  "pH.re"                  "pH.ch"                 
 [70] "pH.to"                  "pH.pH"                  "pH.su"                 
 [73] "pH.al"                  "su.fi"                  "su.vo"                 
 [76] "su.ci"                  "su.re"                  "su.ch"                 
 [79] "su.to"                  "su.pH"                  "su.su"                 
 [82] "su.al"                  "al.fi"                  "al.vo"                 
 [85] "al.ci"                  "al.re"                  "al.ch"                 
 [88] "al.to"                  "al.pH"                  "al.su"                 
 [91] "al.al"                  "r.fixed.acidity"        "w.fixed.acidity"       
 [94] "r.volatile.acidity"     "w.volatile.acidity"     "r.citric.acid"         
 [97] "w.citric.acid"          "r.residual.sugar"       "w.residual.sugar"      
[100] "r.chlorides"            "w.chlorides"            "r.total.sulfur.dioxide"
[103] "w.total.sulfur.dioxide" "r.pH"                   "w.pH"                  
[106] "r.sulphates"            "w.sulphates"            "r.alcohol"             
[109] "w.alcohol"              "r.fi.fi"                "w.fi.fi"               
[112] "r.fi.vo"                "w.fi.vo"                "r.fi.ci"               
[115] "w.fi.ci"                "r.fi.re"                "w.fi.re"               
[118] "r.fi.ch"                "w.fi.ch"                "r.fi.to"               
[121] "w.fi.to"                "r.fi.pH"                "w.fi.pH"               
[124] "r.fi.su"                "w.fi.su"                "r.fi.al"               
[127] "w.fi.al"                "r.vo.fi"                "w.vo.fi"               
[130] "r.vo.vo"                "w.vo.vo"                "r.vo.ci"               
[133] "w.vo.ci"                "r.vo.re"                "w.vo.re"               
[136] "r.vo.ch"                "w.vo.ch"                "r.vo.to"               
[139] "w.vo.to"                "r.vo.pH"                "w.vo.pH"               
[142] "r.vo.su"                "w.vo.su"                "r.vo.al"               
[145] "w.vo.al"                "r.ci.fi"                "w.ci.fi"               
[148] "r.ci.vo"                "w.ci.vo"                "r.ci.ci"               
[151] "w.ci.ci"                "r.ci.re"                "w.ci.re"               
[154] "r.ci.ch"                "w.ci.ch"                "r.ci.to"               
[157] "w.ci.to"                "r.ci.pH"                "w.ci.pH"               
[160] "r.ci.su"                "w.ci.su"                "r.ci.al"               
[163] "w.ci.al"                "r.re.fi"                "w.re.fi"               
[166] "r.re.vo"                "w.re.vo"                "r.re.ci"               
[169] "w.re.ci"                "r.re.re"                "w.re.re"               
[172] "r.re.ch"                "w.re.ch"                "r.re.to"               
[175] "w.re.to"                "r.re.pH"                "w.re.pH"               
[178] "r.re.su"                "w.re.su"                "r.re.al"               
[181] "w.re.al"                "r.ch.fi"                "w.ch.fi"               
[184] "r.ch.vo"                "w.ch.vo"                "r.ch.ci"               
[187] "w.ch.ci"                "r.ch.re"                "w.ch.re"               
[190] "r.ch.ch"                "w.ch.ch"                "r.ch.to"               
[193] "w.ch.to"                "r.ch.pH"                "w.ch.pH"               
[196] "r.ch.su"                "w.ch.su"                "r.ch.al"               
[199] "w.ch.al"                "r.to.fi"                "w.to.fi"               
[202] "r.to.vo"                "w.to.vo"                "r.to.ci"               
[205] "w.to.ci"                "r.to.re"                "w.to.re"               
[208] "r.to.ch"                "w.to.ch"                "r.to.to"               
[211] "w.to.to"                "r.to.pH"                "w.to.pH"               
[214] "r.to.su"                "w.to.su"                "r.to.al"               
[217] "w.to.al"                "r.pH.fi"                "w.pH.fi"               
[220] "r.pH.vo"                "w.pH.vo"                "r.pH.ci"               
[223] "w.pH.ci"                "r.pH.re"                "w.pH.re"               
[226] "r.pH.ch"                "w.pH.ch"                "r.pH.to"               
[229] "w.pH.to"                "r.pH.pH"                "w.pH.pH"               
[232] "r.pH.su"                "w.pH.su"                "r.pH.al"               
[235] "w.pH.al"                "r.su.fi"                "w.su.fi"               
[238] "r.su.vo"                "w.su.vo"                "r.su.ci"               
[241] "w.su.ci"                "r.su.re"                "w.su.re"               
[244] "r.su.ch"                "w.su.ch"                "r.su.to"               
[247] "w.su.to"                "r.su.pH"                "w.su.pH"               
[250] "r.su.su"                "w.su.su"                "r.su.al"               
[253] "w.su.al"                "r.al.fi"                "w.al.fi"               
[256] "r.al.vo"                "w.al.vo"                "r.al.ci"               
[259] "w.al.ci"                "r.al.re"                "w.al.re"               
[262] "r.al.ch"                "w.al.ch"                "r.al.to"               
[265] "w.al.to"                "r.al.pH"                "w.al.pH"               
[268] "r.al.su"                "w.al.su"                "r.al.al"               
[271] "w.al.al"               

Now let’s remove all the columns that don’t start with r. or w.

dataX.nonlinear.final <- dataX.nonlinear[c(92:271)]
names(dataX.nonlinear.final)
  [1] "r.fixed.acidity"        "w.fixed.acidity"        "r.volatile.acidity"    
  [4] "w.volatile.acidity"     "r.citric.acid"          "w.citric.acid"         
  [7] "r.residual.sugar"       "w.residual.sugar"       "r.chlorides"           
 [10] "w.chlorides"            "r.total.sulfur.dioxide" "w.total.sulfur.dioxide"
 [13] "r.pH"                   "w.pH"                   "r.sulphates"           
 [16] "w.sulphates"            "r.alcohol"              "w.alcohol"             
 [19] "r.fi.fi"                "w.fi.fi"                "r.fi.vo"               
 [22] "w.fi.vo"                "r.fi.ci"                "w.fi.ci"               
 [25] "r.fi.re"                "w.fi.re"                "r.fi.ch"               
 [28] "w.fi.ch"                "r.fi.to"                "w.fi.to"               
 [31] "r.fi.pH"                "w.fi.pH"                "r.fi.su"               
 [34] "w.fi.su"                "r.fi.al"                "w.fi.al"               
 [37] "r.vo.fi"                "w.vo.fi"                "r.vo.vo"               
 [40] "w.vo.vo"                "r.vo.ci"                "w.vo.ci"               
 [43] "r.vo.re"                "w.vo.re"                "r.vo.ch"               
 [46] "w.vo.ch"                "r.vo.to"                "w.vo.to"               
 [49] "r.vo.pH"                "w.vo.pH"                "r.vo.su"               
 [52] "w.vo.su"                "r.vo.al"                "w.vo.al"               
 [55] "r.ci.fi"                "w.ci.fi"                "r.ci.vo"               
 [58] "w.ci.vo"                "r.ci.ci"                "w.ci.ci"               
 [61] "r.ci.re"                "w.ci.re"                "r.ci.ch"               
 [64] "w.ci.ch"                "r.ci.to"                "w.ci.to"               
 [67] "r.ci.pH"                "w.ci.pH"                "r.ci.su"               
 [70] "w.ci.su"                "r.ci.al"                "w.ci.al"               
 [73] "r.re.fi"                "w.re.fi"                "r.re.vo"               
 [76] "w.re.vo"                "r.re.ci"                "w.re.ci"               
 [79] "r.re.re"                "w.re.re"                "r.re.ch"               
 [82] "w.re.ch"                "r.re.to"                "w.re.to"               
 [85] "r.re.pH"                "w.re.pH"                "r.re.su"               
 [88] "w.re.su"                "r.re.al"                "w.re.al"               
 [91] "r.ch.fi"                "w.ch.fi"                "r.ch.vo"               
 [94] "w.ch.vo"                "r.ch.ci"                "w.ch.ci"               
 [97] "r.ch.re"                "w.ch.re"                "r.ch.ch"               
[100] "w.ch.ch"                "r.ch.to"                "w.ch.to"               
[103] "r.ch.pH"                "w.ch.pH"                "r.ch.su"               
[106] "w.ch.su"                "r.ch.al"                "w.ch.al"               
[109] "r.to.fi"                "w.to.fi"                "r.to.vo"               
[112] "w.to.vo"                "r.to.ci"                "w.to.ci"               
[115] "r.to.re"                "w.to.re"                "r.to.ch"               
[118] "w.to.ch"                "r.to.to"                "w.to.to"               
[121] "r.to.pH"                "w.to.pH"                "r.to.su"               
[124] "w.to.su"                "r.to.al"                "w.to.al"               
[127] "r.pH.fi"                "w.pH.fi"                "r.pH.vo"               
[130] "w.pH.vo"                "r.pH.ci"                "w.pH.ci"               
[133] "r.pH.re"                "w.pH.re"                "r.pH.ch"               
[136] "w.pH.ch"                "r.pH.to"                "w.pH.to"               
[139] "r.pH.pH"                "w.pH.pH"                "r.pH.su"               
[142] "w.pH.su"                "r.pH.al"                "w.pH.al"               
[145] "r.su.fi"                "w.su.fi"                "r.su.vo"               
[148] "w.su.vo"                "r.su.ci"                "w.su.ci"               
[151] "r.su.re"                "w.su.re"                "r.su.ch"               
[154] "w.su.ch"                "r.su.to"                "w.su.to"               
[157] "r.su.pH"                "w.su.pH"                "r.su.su"               
[160] "w.su.su"                "r.su.al"                "w.su.al"               
[163] "r.al.fi"                "w.al.fi"                "r.al.vo"               
[166] "w.al.vo"                "r.al.ci"                "w.al.ci"               
[169] "r.al.re"                "w.al.re"                "r.al.ch"               
[172] "w.al.ch"                "r.al.to"                "w.al.to"               
[175] "r.al.pH"                "w.al.pH"                "r.al.su"               
[178] "w.al.su"                "r.al.al"                "w.al.al"               

Now we’re ready to fit the data with a model. We no longer need the prior weights because the red and white wines now occupy disjoint sets of columns.

dataX.nonlinear.test <- dataX.nonlinear.final[c(test.indices),]
dataX.nonlinear.train <- dataX.nonlinear.final[-c(test.indices),]
set.seed(123)
model.nonlinear <- plsRglm(dataY.train,dataX.nonlinear.train,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.nonlinear.train,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
____************************************************____
Only naive DoF can be used with PLS GLM

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
____Component____ 2 ____
____Component____ 3 ____
____Component____ 4 ____
____Component____ 5 ____
____Component____ 6 ____
____Component____ 7 ____
____Component____ 8 ____
____Component____ 9 ____
____Component____ 10 ____
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
model.nonlinear
Number of required components:
[1] 10
Number of successfully computed components:
[1] 10
Coefficients:
                               [,1]
3|4                    -6.333222432
4|5                    -4.132107335
5|6                    -0.799325688
6|7                     1.980787719
7|8                     4.408132272
8|9                     8.196017259
r.fixed.acidity         0.087981368
w.fixed.acidity        -0.179018943
r.volatile.acidity     -0.148377839
w.volatile.acidity     -0.624278385
r.citric.acid          -0.092287058
w.citric.acid          -0.127517388
r.residual.sugar        0.025302122
w.residual.sugar       -0.124125326
r.chlorides            -0.317357554
w.chlorides            -0.372961471
r.total.sulfur.dioxide  0.064353690
w.total.sulfur.dioxide -0.123273324
r.pH                    0.151860360
w.pH                   -0.187626548
r.sulphates            -0.002557739
w.sulphates             0.201115952
r.alcohol              -0.638127013
w.alcohol              -0.849652109
r.fi.fi                 0.065045008
w.fi.fi                 0.058471632
r.fi.vo                 0.021614149
w.fi.vo                 0.036738361
r.fi.ci                -0.048184932
w.fi.ci                -0.048451098
r.fi.re                 0.039931148
w.fi.re                 0.025571911
r.fi.ch                 0.010439775
w.fi.ch                 0.069856870
r.fi.to                -0.055362535
w.fi.to                -0.013656689
r.fi.pH                 0.001188566
w.fi.pH                 0.009732565
r.fi.su                -0.071177177
w.fi.su                 0.002568474
r.fi.al                -0.042789370
w.fi.al                -0.056924576
r.vo.fi                 0.021614149
w.vo.fi                 0.036738361
r.vo.vo                 0.222498273
w.vo.vo                -0.042069757
r.vo.ci                 0.086877296
w.vo.ci                 0.029137651
r.vo.re                -0.074735306
w.vo.re                 0.024580259
r.vo.ch                 0.011925799
w.vo.ch                 0.036417462
r.vo.to                 0.082336379
w.vo.to                -0.011543873
r.vo.pH                -0.012417896
w.vo.pH                 0.118172799
r.vo.su                 0.040387060
w.vo.su                 0.022518414
r.vo.al                 0.051507251
w.vo.al                 0.122956814
r.ci.fi                -0.048184932
w.ci.fi                -0.048451098
r.ci.vo                 0.086877296
w.ci.vo                 0.029137651
r.ci.ci                -0.020768860
w.ci.ci                 0.127809781
r.ci.re                -0.072725065
w.ci.re                -0.004554723
r.ci.ch                -0.108001145
w.ci.ch                 0.051268352
r.ci.to                -0.031498123
w.ci.to                -0.039588376
r.ci.pH                 0.070957543
w.ci.pH                -0.055047190
r.ci.su                 0.019198704
w.ci.su                -0.006987056
r.ci.al                -0.016673574
w.ci.al                -0.022230763
r.re.fi                 0.039931148
w.re.fi                 0.025571911
r.re.vo                -0.074735306
w.re.vo                 0.024580259
r.re.ci                -0.072725065
w.re.ci                -0.004554723
r.re.re                -0.200936351
w.re.re                 0.120659513
r.re.ch                 0.128565006
w.re.ch                -0.071996816
r.re.to                 0.043325701
w.re.to                -0.104736977
r.re.pH                -0.059934973
w.re.pH                 0.081706213
r.re.su                -0.167865221
w.re.su                -0.025795519
r.re.al                 0.137454766
w.re.al                 0.030472565
r.ch.fi                 0.010439775
w.ch.fi                 0.069856870
r.ch.vo                 0.011925799
w.ch.vo                 0.036417462
r.ch.ci                -0.108001145
w.ch.ci                 0.051268352
r.ch.re                 0.128565006
w.ch.re                -0.071996816
r.ch.ch                -0.046473426
w.ch.ch                 0.030619181
r.ch.to                -0.030339355
w.ch.to                -0.010720915
r.ch.pH                 0.006896270
w.ch.pH                -0.134271137
r.ch.su                -0.056256025
w.ch.su                 0.043759486
r.ch.al                 0.061785768
w.ch.al                -0.062070633
r.to.fi                -0.055362535
w.to.fi                -0.013656689
r.to.vo                 0.082336379
w.to.vo                -0.011543873
r.to.ci                -0.031498123
w.to.ci                -0.039588376
r.to.re                 0.043325701
w.to.re                -0.104736977
r.to.ch                -0.030339355
w.to.ch                -0.010720915
r.to.to                 0.025907962
w.to.to                 0.339218377
r.to.pH                -0.072797537
w.to.pH                 0.060090246
r.to.su                -0.236226188
w.to.su                -0.097249494
r.to.al                -0.012937470
w.to.al                -0.086769181
r.pH.fi                 0.001188566
w.pH.fi                 0.009732565
r.pH.vo                -0.012417896
w.pH.vo                 0.118172799
r.pH.ci                 0.070957543
w.pH.ci                -0.055047190
r.pH.re                -0.059934973
w.pH.re                 0.081706213
r.pH.ch                 0.006896270
w.pH.ch                -0.134271137
r.pH.to                -0.072797537
w.pH.to                 0.060090246
r.pH.pH                 0.062837816
w.pH.pH                -0.022921008
r.pH.su                 0.093873449
w.pH.su                 0.052723798
r.pH.al                 0.021526204
w.pH.al                 0.003771691
r.su.fi                -0.071177177
w.su.fi                 0.002568474
r.su.vo                 0.040387060
w.su.vo                 0.022518414
r.su.ci                 0.019198704
w.su.ci                -0.006987056
r.su.re                -0.167865221
w.su.re                -0.025795519
r.su.ch                -0.056256025
w.su.ch                 0.043759486
r.su.to                -0.236226188
w.su.to                -0.097249494
r.su.pH                 0.093873449
w.su.pH                 0.052723798
r.su.su                 0.078836391
w.su.su                -0.011212834
r.su.al                 0.108341020
w.su.al                -0.061325030
r.al.fi                -0.042789370
w.al.fi                -0.056924576
r.al.vo                 0.051507251
w.al.vo                 0.122956814
r.al.ci                -0.016673574
w.al.ci                -0.022230763
r.al.re                 0.137454766
w.al.re                 0.030472565
r.al.ch                 0.061785768
w.al.ch                -0.062070633
r.al.to                -0.012937470
w.al.to                -0.086769181
r.al.pH                 0.021526204
w.al.pH                 0.003771691
r.al.su                 0.108341020
w.al.su                -0.061325030
r.al.al                 0.015122485
w.al.al                -0.044865085
Information criteria and Fit statistics:
                AIC      BIC Missclassed Chi2_Pearson_Y
Nb_Comp_0  13215.08 13254.41        2919       6661.055
Nb_Comp_1  12167.07 12212.96        2703       5213.714
Nb_Comp_2  11666.04 11718.49        2472       4878.549
Nb_Comp_3  11314.49 11373.49        2403       4612.090
Nb_Comp_4  11185.21 11250.76        2347       4510.452
Nb_Comp_5  11135.51 11207.62        2324       4458.546
Nb_Comp_6  11030.83 11109.50        2306       4361.782
Nb_Comp_7  10970.11 11055.33        2300       4316.858
Nb_Comp_8  10933.24 11025.02        2296       4287.648
Nb_Comp_9  10908.99 11007.32        2280       4270.720
Nb_Comp_10 10887.99 10992.88        2271       4248.284

Now let’s see how well it did.

predictions.train <- as.numeric(predict(model.nonlinear$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.nonlinear, newdata = dataX.nonlinear.test, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
[1] 0.4824865
mae(observations.test, predictions.test)
[1] 0.4938462

The MAE has improved slightly, by about 1%. Let’s try this again, but restricted to the variables that showed the highest correlation with quality.

for (feature in names(dataX.nonlinear.train)) {
  feature.cor <- cor(dataX.nonlinear.train[feature], as.numeric(dataY.train))
  if (abs(feature.cor) > 0.2) {
    print(feature)
    print(feature.cor)
  }
}
[1] "w.chlorides"
                 [,1]
w.chlorides 0.2643897
[1] "r.alcohol"
               [,1]
r.alcohol 0.2151227
[1] "w.alcohol"
               [,1]
w.alcohol 0.3800927
[1] "w.fi.al"
             [,1]
w.fi.al 0.2177318
[1] "w.ch.al"
             [,1]
w.ch.al 0.2441753
[1] "w.to.al"
            [,1]
w.to.al 0.204873
[1] "w.al.fi"
             [,1]
w.al.fi 0.2177318
[1] "w.al.ch"
             [,1]
w.al.ch 0.2441753
[1] "w.al.to"
            [,1]
w.al.to 0.204873
[1] "w.al.al"
             [,1]
w.al.al 0.2308361
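
The correlation scan above can also be written more compactly with sapply (same logic as the loop, just vectorized over columns):

```r
# Correlation of every engineered column with quality, keeping only
# those exceeding the 0.2 threshold in absolute value.
cors <- sapply(dataX.nonlinear.train,
               function(col) cor(col, as.numeric(dataY.train)))
cors[abs(cors) > 0.2]
```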

These are the best correlations we see. They are very low, and I don’t believe restricting to these features will be useful. Moreover, all but r.alcohol are white wine features, so red wines would be left almost without information.

This method could probably be improved further by engineering cubic and quartic terms, but the number of features would grow correspondingly and the fit would take much longer. However, we can drastically reduce the number of features by dropping the cross-terms. The variables appear to be mostly independent anyway, so we probably won’t get much more information from xy than from x and y separately. What we really need are x^2, x^3, and x^4. Let’s try it.

order = 4
dataX.nonlinear <- data.frame(dataX)
# Build pure power terms x^2, ..., x^order for each continuous feature
for (feature in names(dataX.nonlinear[1:9])) {
  for (power in 2:order) {
    new.feature <- paste(substring(feature,1,2), as.character(power), sep=".")
    dataX.nonlinear[new.feature] <- dataX.nonlinear[feature]^power
  }
}

# Separate the red and white wine features
for (feature in names(dataX.nonlinear[c(1:9, 11:length(dataX.nonlinear))])) {
  r.feature <- paste("r", feature, sep=".")
  w.feature <- paste("w", feature, sep=".")
  dataX.nonlinear[r.feature] <- ifelse(dataX.nonlinear$color == 1,
                                       dataX.nonlinear[[feature]], 0.0)
  dataX.nonlinear[w.feature] <- ifelse(dataX.nonlinear$color == 2,
                                       dataX.nonlinear[[feature]], 0.0)
}

# Remove all the columns that don't start with r. or w.
# The mixed-color columns are the 9 originals, color, and 9*(order-1) powers.
n.mixed.color.features <- 10 + 9*(order-1)
start.index <- n.mixed.color.features + 1
end.index <- length(dataX.nonlinear)
dataX.nonlinear.final <- dataX.nonlinear[c(start.index:end.index)]
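
As an aside, base R’s poly() can generate the same raw power terms more compactly. A sketch for a single feature (raw = TRUE gives x, x^2, …, x^order rather than orthogonal polynomials; extending this over all nine features and the red/white split is left as in the loops above):

```r
# Build x, x^2, ..., x^4 for the alcohol feature with poly()
powers <- poly(dataX$alcohol, degree = order, raw = TRUE)
colnames(powers) <- paste("al", 1:order, sep = ".")
head(powers)
```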

Now let’s apply the model.

dataX.nonlinear.test <- dataX.nonlinear.final[c(test.indices),]
dataX.nonlinear.train <- dataX.nonlinear.final[-c(test.indices),]
set.seed(123)
model.nonlinear <- plsRglm(dataY.train,dataX.nonlinear.train,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.nonlinear.train,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)

Let’s see how it performed.

predictions.train <- as.numeric(predict(model.nonlinear$FinalModel))
There were 24 warnings (use warnings() to see them)
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.nonlinear, newdata = dataX.nonlinear.test, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
[1] 0.4824865
mae(observations.test, predictions.test)
[1] 0.4938462

This method was just as good as the cross-terms method, but simpler.

Binning into extra dimensions

Let’s try another approach. We can bin each feature and let every bin become its own indicator feature. The model then fits a separate coefficient for every bin, approximating each feature’s effect as a step function, which lets nonlinear relationships be captured by the linear model.

We now transform the data points row by row into a sparser dataset of binned features.

dataX.expanded <- data.frame(dataX)
nbins.default <- 20
for (feature in names(dataX.expanded)) {
  nbins <- if (feature == "color") 2 else nbins.default
  x <- dataX.expanded[[feature]]

  # Map each value of the feature to a bin number. The range is padded by
  # 1% on each side so the minimum and maximum fall strictly inside bins.
  xmin <- 0.99*min(x)
  xmax <- 1.01*max(x)
  slope <- nbins/(xmax - xmin)
  feature.bin <- paste(feature, "bin", sep="")
  dataX.expanded[feature.bin] <- floor(slope*(x - xmin))

  # Create a 0/1 indicator column for each bin. (Earlier variants that kept
  # the within-bin offset, or the raw value, instead of a 1 were tried and
  # discarded.) Note the strict inequalities: a value falling exactly on an
  # interior bin boundary is assigned to no bin.
  for (i in 1:nbins) {
    new.feature <- paste(feature, as.character(i), sep="")
    dataX.expanded[new.feature] <- ifelse(x < i/slope + xmin & x > (i-1)/slope + xmin, 1, 0)
  }
}
dataX.expanded
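
An equivalent and more idiomatic way to build these indicator columns is cut() plus model.matrix(). This is only a sketch, shown for the alcohol feature and assuming the dataX frame from earlier sections; cut() assigns every value to exactly one interval, so boundary values are never dropped.

```r
# One-hot binning of a single feature with cut() + model.matrix().
bin.feature <- function(x, nbins = 20) {
  bins <- cut(x, breaks = nbins)   # factor with nbins equal-width levels
  model.matrix(~ bins - 1)         # one 0/1 indicator column per bin
}
binned.alcohol <- bin.feature(dataX$alcohol)
dim(binned.alcohol)   # one row per wine, one column per bin
```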

We then remove all the extra columns and keep only the indicator columns x1, x2, …, x20 for each feature x (just x1 and x2 for color).

dataX.binned <- dataX.expanded[,grepl("\\d$", names(dataX.expanded))]
dataX.binned

We also need to remove the columns that contain only zeroes: a constant column has zero variance and cannot be standardized, which breaks the model fit.

# Indicator columns are 0/1, so a positive mean means the bin is non-empty
means <- sapply(dataX.binned, mean)
dataX.final <- dataX.binned[which(means > 0)]
dataX.final

Now we are ready to fit the data with the plsRglm model.

set.seed(123)
model.pls.expanded <- plsRglm(dataY,dataX.final,nt=10,dataPredictY=dataX.final,modele="pls-glm-polr")
____************************************************____

Model: pls-glm-polr 
Method: logistic 

____Component____ 1 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
____Component____ 2 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated many times; duplicates omitted)
____Component____ 3 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated many times; duplicates omitted)
____Component____ 4 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated many times; duplicates omitted)
____Component____ 5 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
____Component____ 6 ____
glm.fit: fitted probabilities numerically 0 or 1 occurred
(warning repeated many times; duplicates omitted)
____Component____ 7 ____
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
____Component____ 8 ____
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
____Component____ 9 ____
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 
occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
____Component____ 10 ____
____Predicting X without NA neither in X nor in Y____
****________________________________________________****
model.pls.expanded
Number of required components:
[1] 10
Number of successfully computed components:
[1] 10
Coefficients:
                                [,1]
3|4                    -6.350318e+00
4|5                    -4.111657e+00
5|6                    -8.516516e-01
6|7                     1.886597e+00
7|8                     4.307041e+00
8|9                     8.101431e+00
fixed.acidity1         -3.659454e-01
fixed.acidity2         -1.852644e-01
fixed.acidity3         -9.924809e-02
fixed.acidity4          5.780530e-02
fixed.acidity5          2.991524e-01
fixed.acidity6          2.368322e-01
fixed.acidity7          1.498036e-01
fixed.acidity8         -1.823345e-01
fixed.acidity9         -4.436807e-02
fixed.acidity10         1.913853e-02
fixed.acidity11         5.370343e-02
fixed.acidity12        -3.228410e-02
fixed.acidity13        -7.171017e-01
fixed.acidity14        -5.712047e-01
fixed.acidity15         7.118439e-02
fixed.acidity16         2.580047e+00
fixed.acidity17        -1.892456e+00
fixed.acidity18         3.751195e+00
fixed.acidity20        -2.082935e+00
volatile.acidity1       3.496774e+00
volatile.acidity2       2.547762e+00
volatile.acidity3       1.268516e+00
volatile.acidity4       1.131194e+00
volatile.acidity5       6.828435e-01
volatile.acidity6       4.043778e-01
volatile.acidity7       1.208307e-01
volatile.acidity8       1.003904e-01
volatile.acidity9      -1.721623e-01
volatile.acidity10     -5.940907e-01
volatile.acidity11     -7.133149e-01
volatile.acidity12     -1.002180e+00
volatile.acidity13     -1.197317e+00
volatile.acidity14     -8.757141e-01
volatile.acidity15     -1.294720e+00
volatile.acidity16     -1.393338e+00
volatile.acidity17     -2.023239e+00
volatile.acidity18     -2.156788e+00
volatile.acidity19      1.213499e-01
volatile.acidity20     -1.172148e+00
citric.acid1           -2.774812e-02
citric.acid2            4.897177e-02
citric.acid3            1.230806e-01
citric.acid4            1.821481e-01
citric.acid5            1.544738e-01
citric.acid6           -1.970135e-01
citric.acid7           -3.082459e-01
citric.acid8           -7.245165e-02
citric.acid9            3.076393e-02
citric.acid10          -2.634582e-02
citric.acid11           3.091745e-01
citric.acid12           2.400045e-01
citric.acid13           5.542724e-01
citric.acid14           1.487787e+00
citric.acid15          -2.139893e-01
citric.acid17           1.984131e+00
citric.acid20           2.028353e+00
residual.sugar1         3.368354e+00
residual.sugar2         1.189364e+00
residual.sugar3         5.363889e-01
residual.sugar4         6.380188e-01
residual.sugar5         2.594023e-01
residual.sugar6         1.452022e-01
residual.sugar7        -1.338478e-01
residual.sugar8        -1.841445e-02
residual.sugar9         4.500865e-02
residual.sugar10       -1.148067e-01
residual.sugar11       -8.152778e-02
residual.sugar12        3.035100e-03
residual.sugar13       -1.052126e-01
residual.sugar14       -1.750032e-01
residual.sugar15       -2.509328e-01
residual.sugar16       -7.717315e-01
residual.sugar17       -2.773786e-01
residual.sugar18        2.341719e-01
residual.sugar19        5.275473e-01
residual.sugar20       -2.152196e+00
chlorides1             -2.804455e-02
chlorides2              2.665235e-01
chlorides3             -8.928025e-03
chlorides4              3.452413e-01
chlorides5              1.566659e-01
chlorides6              1.317020e-01
chlorides7              1.728296e-01
chlorides8              1.030822e-02
chlorides9             -3.677502e-01
chlorides10            -3.119672e-01
chlorides11            -4.817883e-01
chlorides12            -1.916585e-01
chlorides13            -8.166339e-01
chlorides14            -3.733069e-01
chlorides15            -8.657187e-02
chlorides16            -9.441328e-01
chlorides17            -9.030765e-01
chlorides20             3.292669e+00
total.sulfur.dioxide1  -7.264231e-02
total.sulfur.dioxide2  -1.867667e-01
total.sulfur.dioxide3  -7.833898e-02
total.sulfur.dioxide4  -2.573164e-01
total.sulfur.dioxide5   2.612766e-02
total.sulfur.dioxide6   3.753306e-01
total.sulfur.dioxide7   3.293630e-01
total.sulfur.dioxide8  -6.766164e-02
total.sulfur.dioxide9  -2.312411e-03
total.sulfur.dioxide10 -1.388713e-01
total.sulfur.dioxide11 -1.909280e-01
total.sulfur.dioxide12  9.555384e-02
total.sulfur.dioxide13  8.165815e-02
total.sulfur.dioxide14  1.145414e-01
total.sulfur.dioxide15  2.690875e-01
total.sulfur.dioxide16 -2.616085e-01
total.sulfur.dioxide17  3.036307e+00
total.sulfur.dioxide18  4.025632e+00
total.sulfur.dioxide20  3.059913e+03
pH1                    -4.038044e-01
pH2                     9.848509e-01
pH3                    -4.792703e-02
pH4                    -5.278114e-02
pH5                     9.641745e-02
pH6                     8.301635e-02
pH7                    -2.241533e-02
pH8                     8.445422e-02
pH9                    -9.786441e-02
pH10                   -1.624191e-01
pH11                   -1.946385e-01
pH12                   -3.027051e-02
pH13                   -3.813705e-02
pH14                    6.869872e-01
pH15                    8.675343e-01
pH16                    1.118431e+00
pH17                   -3.037108e-01
pH18                    8.894154e-01
pH19                    2.824933e+00
pH20                    4.143929e-01
sulphates1             -9.389119e-02
sulphates2             -3.347355e-01
sulphates3             -8.018650e-01
sulphates4             -6.843723e-01
sulphates5             -3.989538e-01
sulphates6             -2.779803e-01
sulphates7              9.201228e-02
sulphates8              1.498525e-01
sulphates9              1.539153e-01
sulphates10             2.130380e-01
sulphates11             2.533586e-01
sulphates12             5.634949e-01
sulphates13             5.124379e-02
sulphates14             2.007053e-01
sulphates15             4.981493e-01
sulphates16             8.724902e-01
sulphates17            -1.690097e+00
sulphates18            -4.680026e-01
sulphates19             1.122228e+00
sulphates20             1.029151e+00
alcohol2                1.175174e+00
alcohol3                7.626281e-01
alcohol4                1.087288e+00
alcohol5                1.003283e+00
alcohol6                5.763800e-01
alcohol7                3.208345e-01
alcohol8                1.416689e-01
alcohol9               -3.671026e-01
alcohol10              -4.952422e-01
alcohol11              -1.060599e+00
alcohol12              -1.342865e+00
alcohol13              -1.591216e+00
alcohol14              -2.147514e+00
alcohol15              -2.223824e+00
alcohol16              -2.571881e+00
alcohol17              -2.105089e+00
alcohol18              -2.775190e+00
alcohol20               2.771932e+00
color1                 -2.900309e-01
color2                  2.900298e-01
Information criteria and Fit statistics:
                AIC      BIC Missclassed Chi2_Pearson_Y
Nb_Comp_0  16561.19 16601.86        3661       8388.609
Nb_Comp_1  14709.97 14757.43        3162       6307.853
Nb_Comp_2  14188.93 14243.16        2995       5837.341
Nb_Comp_3  13936.33 13997.34        2937       5607.245
Nb_Comp_4  13820.91 13888.70        2891       5518.015
Nb_Comp_5  13782.64 13857.21        2870       5478.408
Nb_Comp_6  13765.23 13846.58        2867       5458.309
Nb_Comp_7  13761.00 13849.12        2880       5449.549
Nb_Comp_8  13759.36 13854.26        2875       5442.879
Nb_Comp_9  13760.46 13862.14        2869       5441.112
Nb_Comp_10 13762.72 13871.18        2870       5439.682

Next, we evaluate the model’s performance as before.

predictions <- as.numeric(predict(model.pls.expanded$FinalModel))
observations <- as.numeric(dataY)
table(as.numeric(unlist(dataY)), predictions )
   predictions
       1    2    3    4    5    6
  1    1    1   18   10    0    0
  2    0    0  140   72    4    0
  3    0    3 1330  790   15    0
  4    0    0  619 1999  216    1
  5    0    0   33  749  295    2
  6    0    0    1  122   68    2
  7    0    0    0    1    4    0
plot(observations, predictions)

mae(observations, predictions)
[1] 0.4873768

This is the best MAE we have seen so far, but I don’t think it is good enough to justify the effort and long runtime it required. It will also be difficult to extend the model to new data, since we removed many all-zero columns before fitting.

Conclusion

In conclusion, the most accurate and robust method I found is multiplying columns together, which effectively turns the plsRglm model into a polynomial model. This approach performed better than a simple plsRglm model and can easily be applied to new data. I implemented the polynomial model in the script AFS_wine_quality_model_function.R. At this point, no ensemble method is applied.
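For reference, the cross-term expansion at the heart of that polynomial model can be packaged as a reusable function. This is a sketch based on the feature-engineering code earlier in this notebook, not the exact contents of AFS_wine_quality_model_function.R:

```r
# Sketch: quadratic cross-term expansion used by the polynomial model.
# Multiplies every continuous feature by every other continuous feature;
# new columns are named from the first two letters of each parent column.
make_cross_terms <- function(dataX, features) {
  for (feature1 in features) {
    for (feature2 in features) {
      new.feature <- paste(substring(feature1, 1, 2),
                           substring(feature2, 1, 2), sep = ".")
      dataX[new.feature] <- dataX[[feature1]] * dataX[[feature2]]
    }
  }
  dataX
}
```

New data only needs to pass through the same expansion (and the same transformations) before being fed to the fitted model.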

---
title: "AFS wine quality project"
author: "Jared Jay"
output: html_notebook
---

## Introduction

This project attempts to fulfill the assignment outlined in https://www.gastrograph.com/blogs/interviewing-data-science-interns.

"Using only partial least squares regression for generalized linear models (the plsRglm package in R), build an ensemble model to predict the quality score given to each wine from the Vinho Verde region of Portugal (see the data bullet in the Requirements section below to download the datasets). The data consists of chemical tests on wines and an overall assessment score averaged from at least three professional wine tasters. This is interesting data for AFS as the lack of consistent preferences among professional tasters is one of the reasons our company exists."

"The rubric for assessment is explained in the Selection Criteria section below. The prediction model should be trained and will be tested on both red and white wine after joining the two datasets. We will split the supplied data into an 80% training set and a 20% hold-out validation set before running your training script with a random seed. We will use mean absolute error as a performance metric."
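For reference, the evaluation protocol described above can be sketched in a few lines. The constant-median predictor here is only a stand-in baseline (not the model built in this notebook), and the seed is whatever the graders choose:

```{r}
# Sketch of the graders' protocol: seeded 80/20 split, scored by MAE.
# Assumes `df` has a `quality` column; the median baseline is a placeholder.
evaluate_protocol <- function(df, seed = 42) {
  set.seed(seed)                            # graders supply the random seed
  n <- nrow(df)
  test.idx <- sample.int(n, floor(0.2 * n)) # 20% hold-out validation set
  train <- df[-test.idx, ]
  test <- df[test.idx, ]
  baseline <- median(train$quality)         # placeholder prediction
  mean(abs(baseline - test$quality))        # mean absolute error
}
```

A real submission replaces the baseline with predictions from the fitted plsRglm model.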


## Load necessary libraries

Several packages needed to be installed in order to use plsRglm. Other packages are loaded for data exploration, tidying, and analysis. The install commands are commented out rather than removed in case they need to be re-installed on another machine.

```{r}
#install.packages("mvtnorm")
#install.packages("bipartite")

# Package 'nloptr' must be installed like this: 
# Download https://github.com/stevengj/nlopt/archive/v2.7.1.tar.gz
# Extract contents, then enter the directory and type the following command:
# cmake . && make && sudo make install
# next type this command:
# sudo apt install libnlopt-dev
# Then you can do this:
#install.packages("nloptr")

# Package 'nloptr' is a prerequisite for 'car'
#install.packages("car")

# Packages 'mvtnorm', 'bipartite', and 'car' are prerequisites for 'plsRglm'.
#install.packages("plsRglm")
library("plsRglm") # See https://arxiv.org/abs/1810.01005 

#install.packages("tidyverse")
library("tidyverse") # for tidying up the data set

#install.packages("ggplot2")
library("ggplot2") # for visualizations
#install.packages("GGally")
library("GGally")

#install.packages('plsdof')
library('plsdof') # Necessary for "plsRglm" to work

#install.packages("moments")
library('moments') # For calculating skewness of distributions

#install.packages("Metrics")
library("Metrics") # For calculating MAE

library("stringr") # For regex
```

## Read in the data

The data are stored in two files. One contains information on red wine and the other on white wine. In both cases, values are separated by semicolons. We will read in both files, add a variable describing the color of the wine, and then merge them into one dataframe.

```{r}
# read csv's
df.red <- read.csv("data/winequality-red.csv", sep=';')
df.white <- read.csv("data/winequality-white.csv", sep=';')

# add categorical color variables to both sets
df.red['color'] <- 'red'
df.white['color'] <- 'white'

# merge datasets
df <- rbind(df.white, df.red)
```

The goal is to use these input variables:

   1 - fixed.acidity

   2 - volatile.acidity

   3 - citric.acid

   4 - residual.sugar

   5 - chlorides

   6 - free.sulfur.dioxide

   7 - total.sulfur.dioxide

   8 - density

   9 - pH

   10 - sulphates

   11 - alcohol

   13 - color

To predict this output variable: 

   12 - quality (score between 0 and 10)


## Explore the data

Let's print out a few things to get an idea of what the data look like:
```{r}
str(df)
```

```{r}
head(df)
```

```{r}
tail(df)
```

```{r}
summary(df)
```

Let's now start exploring the data one variable at a time. I've created a function below that will make a plot of two overlapping transparent histograms. It will be useful to compare the distributions for red wine and white wine.
```{r}
plot_multi_histogram <- function(df, feature, label.column, alpha, binwidth) {
  plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label.column)))) +
    geom_histogram(alpha=alpha, position="identity", color="black", binwidth=binwidth) +
    #geom_density(alpha=alpha) +
    #geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
    labs(x=feature, y="Count")
  plt + guides(fill=guide_legend(title=label.column))
}
```

### fixed.acidity
```{r}
#   1 - fixed acidity
summary(df$fixed.acidity)
#table(df$fixed.acidity)
#sort(unique(df$fixed.acidity))
```
```{r}
#ggplot(data = df, aes(x=fixed.acidity, fill=color)) + geom_histogram(alpha=0.2, position="identity")
plot_multi_histogram(df, "fixed.acidity", "color", alpha=0.4, binwidth=1)
```
### volatile.acidity
```{r}
#   2 - volatile acidity
summary(df$volatile.acidity)
```
```{r}
plot_multi_histogram(df, "volatile.acidity", "color", alpha=0.4, binwidth=0.1)
```

### citric.acid
```{r}
#   3 - citric acid
summary(df$citric.acid)
```
```{r}
plot_multi_histogram(df, "citric.acid", "color", alpha=0.4, binwidth=0.1)
```

### residual.sugar
```{r}
#   4 - residual sugar
summary(df$residual.sugar)
```
```{r}
plot_multi_histogram(df, "residual.sugar", "color", alpha=0.4, binwidth=2)
```

### chlorides
```{r}
#   5 - chlorides
summary(df$chlorides)
```
```{r}
plot_multi_histogram(df, "chlorides", "color", alpha=0.4, binwidth=0.01)
```

### free.sulfur.dioxide
```{r}
#   6 - free sulfur dioxide
summary(df$free.sulfur.dioxide)
```
```{r}
plot_multi_histogram(df, "free.sulfur.dioxide", "color", alpha=0.4, binwidth=10)
```

### total.sulfur.dioxide
```{r}
#   7 - total sulfur dioxide
summary(df$total.sulfur.dioxide)
```
```{r}
plot_multi_histogram(df, "total.sulfur.dioxide", "color", alpha=0.4, binwidth=10)
```

### density
```{r}
#   8 - density
summary(df$density)
```
```{r}
plot_multi_histogram(df, "density", "color", alpha=0.4, binwidth=0.001)
```

### pH
```{r}
#   9 - pH
summary(df$pH)
```
```{r}
plot_multi_histogram(df, "pH", "color", alpha=0.4, binwidth=0.05)
```

### sulphates
```{r}
#   10 - sulphates
summary(df$sulphates)
```
```{r}
plot_multi_histogram(df, "sulphates", "color", alpha=0.4, binwidth=0.1)
```

### alcohol
```{r}
#   11 - alcohol
summary(df$alcohol)
```
```{r}
plot_multi_histogram(df, "alcohol", "color", alpha=0.4, binwidth=0.2)
```

### quality
```{r}
#   12 - quality (score between 0 and 10)
summary(df$quality)
```
```{r}
plot_multi_histogram(df, "quality", "color", alpha=0.4, binwidth=1)
```

By looking at the data, we have learned the following:

There are more data on white wine than on red wine, so we may need to weight the observations so that the model learns to predict quality for red wine as well as for white wine.

Several of the distributions are skewed, indicating the presence of long tails or outliers.

Now let's look for correlations between features in the data with a ggpairs plot.
```{r}
ggpairs(df, columns=1:12, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))
```

The highest correlations we see are the following:

0.721 between free.sulfur.dioxide and total.sulfur.dioxide

0.687 between density and alcohol

0.553 between density and residual.sugar

0.495 between total.sulfur.dioxide and residual.sugar

0.459 between density and fixed.acidity

0.444 between alcohol and quality

0.414 between total.sulfur.dioxide and volatile.acidity

0.403 between free.sulfur.dioxide and residual.sugar

There is also a correlation between pH and volatile.acidity, but it's different between red wine and white wine.

Clearly the alcohol content will be useful for our model because of its relatively high correlation with the wine quality rating (0.444).

We also see some outliers. Let's take a closer look at some of the plots to figure out where these outliers are and whether they ought to be removed. The most obvious outlier is the one in residual.sugar and density.
```{r}
ggplot(df, aes(x=residual.sugar, y=density, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")
```

The maximum has over twice as much sugar as the next largest point. It is also twice as far from the minimum density as the next most dense point. It is clearly an outlier and I believe it is safe to remove it from the data. We'll do this by cutting out all points with residual.sugar > 40. The next largest point is also separated from most of the other points, but more so in density than in residual.sugar. The density is correlated with both alcohol (0.687) and residual.sugar (0.553). I think we can assume that much of the information about the density is included already in the alcohol and sugar variables, so we can remove density from the dataset.

Another outlier is in free.sulfur.dioxide and total.sulfur.dioxide.
```{r}
ggplot(df, aes(x=free.sulfur.dioxide, y=total.sulfur.dioxide, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")
```

This point has almost twice as much free sulfur dioxide as the next largest point. However, it is not much of an outlier on the other axis, total.sulfur.dioxide. It also has the minimum quality rating, which means it could have useful predictive power. There is a strong correlation between these two variables (0.721), which means it may be better to simply remove the variable free.sulfur.dioxide from the dataset rather than remove this data point.

Let's apply the changes (remove sugar outlier & remove free.sulfur.dioxide (6) and density (8)).
```{r}
df.cut <- df[df$residual.sugar < 40, c(1:5, 7, 9:13)]
```

Now we'll redraw the plots.
```{r}
ggpairs(df.cut, columns=1:10, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))
```

There are still some points that tend to isolate themselves from most of the data, but not to such a degree as the outlier that we removed. They seem to simply follow the shape of a long-tailed distribution. This calls for a transformation of the data to make the distributions more normal.

Let's begin the transformations by first calculating the skewness of the variables.
```{r}
for (name in names(df.cut[1:9])) {
  skew <- skewness(df.cut[name], na.rm = TRUE)
  print(paste(name,skew))
}
```

We now create a copy of the dataframe and apply transformations. The functions 1/x, sqrt, and log10 are useful for transforming positively skewed distributions into a more gaussian shape, depending on the severity of the skew. The following transformations give decent distributions.
```{r}
df.transformed <- data.frame(df.cut)
df.transformed$fixed.acidity <- 1/(df.transformed$fixed.acidity)
df.transformed$sulphates <- 1/(df.transformed$sulphates)
df.transformed$volatile.acidity <- sqrt(1/(df.transformed$volatile.acidity))
df.transformed$residual.sugar <- log10(df.transformed$residual.sugar)
df.transformed$chlorides <- sqrt(1/(df.transformed$chlorides))
df.transformed$citric.acid <- sqrt(df.transformed$citric.acid + 0.1)
df.transformed$total.sulfur.dioxide <- sqrt(df.transformed$total.sulfur.dioxide)
```

Let's plot a few histograms to see how normal the new distributions are.
```{r}
plot_multi_histogram(df.transformed, "volatile.acidity", "color", alpha=0.4, binwidth=0.1)
```
```{r}
plot_multi_histogram(df.transformed, "residual.sugar", "color", alpha=0.4, binwidth=0.1)
```
```{r}
plot_multi_histogram(df.transformed, "chlorides", "color", alpha=0.4, binwidth=2)
```
```{r}
plot_multi_histogram(df.transformed, "citric.acid", "color", alpha=0.4, binwidth=0.05)
```

We can now calculate the skewness again to see if it has improved.
```{r}
for (name in names(df.transformed[1:9])) {
  skew <- skewness(df.transformed[name], na.rm = TRUE)
  print(paste(name,skew))
}
```

We can also plot a few scatter plots to look at two variables at once, to see how the distributions changed after the transformations and look for any more correlations. We can color-code the quality score.
```{r}
ggplot(df.cut, aes(x=fixed.acidity, y=sulphates, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")
```
```{r}
ggplot(df.transformed, aes(x=fixed.acidity, y=sulphates, color=as.factor(quality))) +
  geom_point(size=6) + scale_color_brewer(palette = "Spectral")
```

In this particular case it looks like the data have become more normally distributed. Let's look at the full correlation plot again because it gives a good view of the entire dataset all at once.
```{r}
ggpairs(df.transformed, columns=1:10, mapping=ggplot2::aes(colour = color), lower=list(continuous='points')) + scale_color_manual(values=c("red", "yellow"))
```

The data look very good now. However, there does not seem to be any easy way to predict the quality of the wine. None of the variables have a very high correlation with quality. The colored scatter plots that we have looked at do not have any obvious clustering of quality scores. Any significant correlations with quality must reside in combinations of more than two features, in a very nonlinear way. This makes the required application of a linear model unusual, but let's see what we can do.

Let's center and scale the data, and then prepare a model.
```{r}
# center and scale the data
df.transformed.scaled <- data.frame(df.transformed)
for (feature in names(df.transformed.scaled[1:9])) {
  df.transformed.scaled[feature] <- scale(df.transformed.scaled[,feature], center = TRUE, scale = TRUE)
}
head(df.transformed.scaled)
```

## First model attempt

The data appear to be ready for the application of a simple test model to see if we can get the plsRglm package working correctly. Let's transform the color variable into numeric factors and set up our X variables and Y target.
```{r}
dataY <- factor(df.transformed.scaled$quality, ordered=TRUE)
dataX <- df.transformed.scaled[1:9]
df.transformed.scaled$color <- factor(df.transformed.scaled$color)
dataX['color'] <- as.numeric(df.transformed.scaled$color)
```

As stated previously, the data ought to be weighted by color so that the model isn't biased towards white wine. We want it to learn to accurately predict the quality of either color. In this case, the difference in size of the two groups is not extreme, but it is still good practice to be aware of and take care of these issues.
```{r}
weight.factor <- nrow(df.red)/nrow(df.white)
get_weight <- function(color.name) {
  if (color.name == "white") {
    return(weight.factor)
  }
  return(1.0)
}
prior.weights <- sapply(df.transformed.scaled$color, get_weight)
```

We now build a model using the required package plsRglm. See https://cran.r-project.org/web/packages/plsRglm/vignettes/plsRglm.pdf for more information on this package and specific applications. More information about the arguments can be found at https://www.rdocumentation.org/packages/plsRglm/versions/1.3.0/topics/plsRglm. We will train on 80% of the data and save 20% for testing later.
```{r}
N.rows <- nrow(dataX)
train.fraction <- 0.8
N.train.rows <- floor(N.rows*train.fraction)
N.test.rows <- N.rows - N.train.rows
test.indices <- sample.int(N.rows, N.test.rows) # select N_test_rows randomly
dataX.test <- dataX[c(test.indices),]
dataX.train <- dataX[-c(test.indices),]
dataY.test <- dataY[c(test.indices)]
dataY.train <- dataY[-c(test.indices)]
prior.weights.test <- prior.weights[c(test.indices)]
prior.weights.train <- prior.weights[-c(test.indices)]
```
```{r}
set.seed(123)
model.pls <- plsRglm(dataY.train,dataX.train,nt=10,limQ2set=.0975,
             dataPredictY=dataX.train,modele="pls-glm-polr",family=NULL,typeVC="none",
             EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
             alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train,
             sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
model.pls
```

Let's take a look at how well the model performs in predicting the quality of wines. We compare the predictions and observations with a table, a plot, and a calculation of the Mean Absolute Error (MAE).
```{r}
predictions.train <- as.numeric(predict(model.pls$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.pls, newdata = dataX.test, type="class"))
observations.test <- as.numeric(dataY.test)
table(as.numeric(unlist(dataY.train)), predictions.train )
table(as.numeric(unlist(dataY.test)), predictions.test )
```
```{r}
plot(observations.test, predictions.test)
```
```{r}
mae(observations.train, predictions.train)
mae(observations.test, predictions.test)
```
This is not bad. It's much better than randomly guessing numbers between 3 and 7, in which case I would expect an MAE of about 1.8. But still, maybe we can do better.

A cross-validation experiment demonstrates that 10 is an appropriate number of components for this model, given the training data. However, it takes a very long time to run, so I have excluded it from this notebook.
```{r}
#cv.modpls <- cv.plsRglm(dataY.train,dataX.train,nt=10,modele="pls-glm-polr",NK=5)
#cv.modpls
#res.cv.modpls=cvtable(summary(cv.modpls, MClassed = TRUE))
#plot(res.cv.modpls) # This suggests that 10 is the preferred number of components for the model.
```

## Ensemble

Let's try to improve the model by implementing an ensemble. Linear regression models are very robust and tend to produce the same result every time, so fitting the same model again and again and averaging the results won't do any good, unlike a random forest, which can create thousands of different trees simply by changing the random seed. Instead of bagging models together, let's try a boosting technique. The first model will use the features to try to predict the quality score. The second model will use the features and the first model's score to try to predict the true quality score, thus fitting the residual, and so on.
```{r}
dataX.train.2 <- data.frame(dataX.train)
dataX.train.2["prediction1"] <- as.numeric( predict(model.pls$FinalModel) )
dataX.test.2 <- data.frame(dataX.test)
dataX.test.2["prediction1"] <- as.numeric( predict(model.pls, newdata = dataX.test, type="class") )
model.pls.2 <- plsRglm(dataY.train,dataX.train.2,nt=10,limQ2set=.0975,
                     dataPredictY=dataX.train.2,modele="pls-glm-polr",family=NULL,typeVC="none",
                     EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                     alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train)
model.pls.2
```

We now calculate the MAE to see how well the second model performed.

```{r}
predictions.train <- as.numeric(predict(model.pls.2$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.pls.2, newdata = dataX.test.2, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
mae(observations.test, predictions.test)
```
Boosting in this manner offered no significant improvement in the MAE.

We could try a boosted ensemble that uses one variable at a time and fits the prediction error with each successive model, but I think that would amount to multivariate regression, only worse.

Let's continue trying a boosted ensemble with all of the variables, including the result of the previous model to make the next predictions. But instead of trying to fit the quality values at each step, we can try to fit the difference between the quality values and the previous prediction. Let's try it.
```{r}
number.of.models <- 3
models.pls <- vector(mode = "list", length = number.of.models) # empty list of models
errors <- vector(mode = "list", length = number.of.models) # empty list of errors
predictions <- as.numeric(dataY.train)*0.0
targets <- as.numeric(dataY.train)
for (i in 1:number.of.models) {
  dataX.this <- data.frame(dataX.train)
  if (i > 1) {
    dataX.this["prediction1"] <- predictions
  }
  targets <- targets - predictions
  model.this <- plsRglm(targets,dataX.this,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.this,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights=prior.weights.train,
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
  #model_this
  predictions <- as.numeric(predict(model.this$FinalModel))
  observations <- as.numeric(targets)
  errors[[i]] <- mae(observations, predictions)
  models.pls[[i]] <- model.this
  print(errors[[i]])
}
```
This results in models that get progressively worse. I see now that I was on the wrong track: the data are too noisy, so boosting a noisy model just produces an even noisier model. What I can do instead is bag the models, which I initially avoided because every model would be the same; but that's only true if I use the same data each time. I could split the dataset into N chunks, fit each one separately, and then combine the results for the final prediction. But that's basically just multivariate linear regression on the full dataset, only worse.

## Feature engineering cross-terms for effective polynomial fitting

Let's try another approach. Since the plsRglm model is designed to handle high-dimensional data with few rows, I'm going to increase the dimensionality of the dataset. This will allow quadratic relationships, which are nonlinear in the original features, to be described linearly by the plsRglm model.

If the relationship between the features and quality is nonlinear, then I just need to create nonlinear features. If I want an xy term in the model instead of just ax + by, I can make a column xy by multiplying two columns together. I'll multiply every continuous feature by every other continuous feature to get new quadratic columns.
```{r}
dataX.nonlinear <- data.frame(dataX)
for (feature1 in names(dataX.nonlinear[1:9])) {
  for (feature2 in names(dataX.nonlinear[1:9])) {
    new.feature <- paste(substring(feature1,1,2), substring(feature2,1,2), sep=".")
    dataX.nonlinear[new.feature] <- dataX.nonlinear[feature1]*dataX.nonlinear[feature2]
  }
}
```
```{r}
names(dataX.nonlinear)
```

I should also separate the red and white wine features, because they appear to be significantly different.
```{r}
for (feature in names(dataX.nonlinear[c(1:9, 11:91)])) {
  r.feature <- paste("r", feature, sep=".")
  w.feature <- paste("w", feature, sep=".")
  # Zero out each feature for the opposite color (color 1 = red, 2 = white)
  dataX.nonlinear[r.feature] <- ifelse(dataX.nonlinear$color == 1,
                                       dataX.nonlinear[[feature]], 0.0)
  dataX.nonlinear[w.feature] <- ifelse(dataX.nonlinear$color == 2,
                                       dataX.nonlinear[[feature]], 0.0)
}
```

Let's look at the new dataframe features.
```{r}
names(dataX.nonlinear)
```

Now let's remove all the columns that don't start with r. or w.
```{r}
dataX.nonlinear.final <- dataX.nonlinear[92:271]  # columns 1:91 are the mixed-color originals
names(dataX.nonlinear.final)
```

Now we're ready to fit a model. We no longer need the prior weights, because the red and white wines now occupy disjoint sets of columns.
```{r}
dataX.nonlinear.test <- dataX.nonlinear.final[c(test.indices),]
dataX.nonlinear.train <- dataX.nonlinear.final[-c(test.indices),]
set.seed(123)
model.nonlinear <- plsRglm(dataY.train,dataX.nonlinear.train,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.nonlinear.train,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
model.nonlinear
```

Now let's see how well it did.
```{r}
predictions.train <- as.numeric(predict(model.nonlinear$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.nonlinear, newdata = dataX.nonlinear.test, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
mae(observations.test, predictions.test)
```
The MAE has improved slightly, by about 1%. Let's try this again, but restricted to the variables which showed the highest correlation with quality.
```{r}
for (feature in names(dataX.nonlinear.train)) {
  feature.cor <- cor(dataX.nonlinear.train[feature], as.numeric(dataY.train))
  if (abs(feature.cor) > 0.2) {
    print(feature)
    print(feature.cor)
  }
}
```
These are the strongest correlations present. They are very low, so I don't believe restricting to these features would be useful. Moreover, they are all white wine features, carrying no information about red wine.

This method could probably be improved further by engineering cubic and quartic terms, but the number of features would grow correspondingly and the fit would take much longer to run. We can, however, drastically reduce the number of features by dropping the cross terms. The variables appear to be mostly independent anyway, so xy probably adds little information beyond x and y separately. What we really need are x^2, x^3, and x^4. Let's try it.
```{r}
order <- 4
dataX.nonlinear <- data.frame(dataX)
# Raise each continuous feature (columns 1:9) to powers 2..order
for (feature in names(dataX.nonlinear[1:9])) {
  for (power in 2:order) {
    new.feature <- paste(substring(feature,1,2), as.character(power), sep=".")
    dataX.nonlinear[new.feature] <- dataX.nonlinear[[feature]]^power
  }
}

# Separate the red and white wine features
for (feature in names(dataX.nonlinear[c(1:9, 11:length(dataX.nonlinear))])) {
  r.feature <- paste("r", feature, sep=".")
  w.feature <- paste("w", feature, sep=".")
  dataX.nonlinear[r.feature] <- ifelse(dataX.nonlinear$color == 1,
                                       dataX.nonlinear[[feature]], 0.0)
  dataX.nonlinear[w.feature] <- ifelse(dataX.nonlinear$color == 2,
                                       dataX.nonlinear[[feature]], 0.0)
}

# Remove all the columns that don't start with r. or w.
# The 10 original columns plus 9*(order-1) power columns are mixed-color.
n.mixed.color.features <- 10 + 9*(order-1)
start.index <- n.mixed.color.features + 1
end.index <- length(dataX.nonlinear)
dataX.nonlinear.final <- dataX.nonlinear[start.index:end.index]
```

Now let's apply the model.
```{r}
dataX.nonlinear.test <- dataX.nonlinear.final[c(test.indices),]
dataX.nonlinear.train <- dataX.nonlinear.final[-c(test.indices),]
set.seed(123)
model.nonlinear <- plsRglm(dataY.train,dataX.nonlinear.train,nt=10,limQ2set=.0975,
                       dataPredictY=dataX.nonlinear.train,modele="pls-glm-polr",family=NULL,typeVC="none",
                       EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
                       alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),
                       sparse=FALSE,sparseStop=TRUE,naive=FALSE,verbose=TRUE)
```

Let's see how it performed.
```{r}
predictions.train <- as.numeric(predict(model.nonlinear$FinalModel))
observations.train <- as.numeric(dataY.train)
predictions.test <- as.numeric(predict(model.nonlinear, newdata = dataX.nonlinear.test, type="class"))
observations.test <- as.numeric(dataY.test)
mae(observations.train, predictions.train)
mae(observations.test, predictions.test)
```
This method was just as good as the cross-terms method, but simpler.
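As a toy illustration, separate from the wine data, of why adding power columns works: an ordinary linear fit cannot capture a quadratic in x alone, but becomes exact once an explicit x^2 column is supplied.

```{r}
# y is exactly quadratic in x; with an I(x^2) column the model is linear
# in its inputs and lm recovers the true coefficients.
x <- seq(-2, 2, by = 0.1)
y <- 3*x^2 - x + 1
fit <- lm(y ~ x + I(x^2))
round(coef(fit), 6)  # intercept 1, slope -1, quadratic coefficient 3
```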

## Binning into extra dimensions

Let's try another approach. We can bin each feature and let every bin become its own indicator feature. Each bin then gets its own dimension, and thus its own fitted coefficient, allowing nonlinear relationships to be captured by the linear model.

We now transform the data row by row into a sparser dataset of binned indicator features.
```{r}
dataX.expanded <- data.frame(dataX)
nbins.default <- 20
for (feature in names(dataX.expanded)) {
  # color is binary; everything else gets the default number of bins
  nbins <- if (feature == "color") 2 else nbins.default
  # widen the range slightly so the observed min and max fall inside it
  xmin <- 0.99*min(dataX.expanded[feature])
  xmax <- 1.01*max(dataX.expanded[feature])
  slope <- nbins/(xmax - xmin)
  
  # create a 0/1 indicator feature for each bin; the lower bound is
  # inclusive so boundary values (e.g. a minimum of exactly 0) get a bin
  for (i in 1:nbins) {
    new.feature <- paste(feature, as.character(i), sep="")
    dataX.expanded[new.feature] <- ifelse(dataX.expanded[[feature]] < i/slope + xmin &
                                          dataX.expanded[[feature]] >= (i-1)/slope + xmin,
                                          1, 0)
  }
}
dataX.expanded
```

We then drop the original columns and keep only the bin indicators x1, x2, ..., x20 (and color1, color2) for each feature x.
```{r}
dataX.binned <- dataX.expanded[, grepl("\\d$", names(dataX.expanded))]  # keep columns ending in a digit
dataX.binned
```

We also need to remove the columns that contain only zeroes, because constant columns cause problems when fitting the model.
```{r}
# the indicator columns are 0/1, so a mean of zero means the column is all zeros
means <- sapply(dataX.binned, mean)
dataX.final <- dataX.binned[which(means > 0)]
dataX.final
```

Now we are ready to fit the data with the plsRglm model.
```{r}
set.seed(123)
model.pls.expanded <- plsRglm(dataY,dataX.final,nt=10,dataPredictY=dataX.final,modele="pls-glm-polr")
model.pls.expanded
```

Next, we evaluate the model's performance like before.
```{r}
predictions <- as.numeric(predict(model.pls.expanded$FinalModel))
observations <- as.numeric(dataY)
table(observations, predictions)
```
```{r}
plot(observations, predictions)
```
```{r}
mae(observations, predictions)
```
This is the best MAE I've seen so far, but I don't think it justifies the effort involved or the long run time. The model will also be difficult to apply to new data, since we removed the all-zero bin columns before fitting and new observations could land in exactly those bins.
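If we did want to apply a binned model to new data, the fix would be to store the training-set bin edges and reuse them, clamping out-of-range values into the edge bins. A minimal sketch follows; the helper name `bin.with.edges` is hypothetical and is not part of the analysis above.

```{r}
# Map new values onto bins defined by the training data's xmin and slope,
# clamping anything outside the training range into the first or last bin.
bin.with.edges <- function(x, xmin, slope, nbins) {
  bins <- floor(slope * (x - xmin)) + 1   # 1-based bin index
  pmin(pmax(bins, 1), nbins)              # clamp to [1, nbins]
}
```

The same stored (xmin, slope, nbins) triple per feature would then drive the indicator-column construction for new rows, keeping the columns aligned with those seen during fitting.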

## Conclusion

The most accurate and robust method I found is multiplying columns together, effectively turning the plsRglm model into a polynomial model. It performed better than a plain plsRglm model and can easily be applied to new data. I implemented the polynomial model in the script AFS_wine_quality_model_function.R. At this point, no ensemble method is applied.
